Measuring the Impact of Diagnostic Decision Support on the Quality of Clinical Decision Making: Development of a Reliable and Valid Composite Score
Abstract
Design: Sets of differential diagnoses and clinical management plans generated by 71 clinicians for six simulated cases, before and after decision support from a Web-based pediatric differential diagnostic tool (ISABEL), were used. Measurements: A composite quality score was calculated separately for each diagnostic and management plan by considering the appropriateness value of each component diagnostic or management suggestion, a weighted sum of individual suggestion ratings, the relevance of the entire plan, and its comprehensiveness. The reliability and validity (face, concurrent, construct, and content) of these two final scores were examined. Results: Two hundred fifty-two diagnostic and 350 management suggestions were included in the interrater reliability analysis. There was good agreement between raters (intraclass correlation coefficient 0.79 for diagnoses and 0.72 for management). No counterintuitive scores were demonstrated on visual inspection of the sets. Content validity was verified by a consultation process with pediatricians. Both scores discriminated adequately between the plans of consultants and medical students and correlated well with clinicians' subjective opinions of overall plan quality (Spearman r = 0.65, p < 0.01). The diagnostic and management scores for each episode showed moderate correlation (r = 0.51). Conclusion: The scores described can be used as key outcome measures in a larger study to fully assess the value of diagnostic decision aids, such as the ISABEL system. J Am Med Inform Assoc. 2003;10:563–572. DOI 10.1197/jamia.M1338.

Many computerized systems have been developed to assist physicians during diagnostic decision making (DDSS). Although the benefits of providing diagnostic decision support in clinical practice have been closely examined, few studies have been able to convincingly show changes in physician behavior or improved patient outcomes resulting from the use of a DDSS.
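The abstract describes the composite score as combining per-suggestion appropriateness ratings (via a weighted sum) with plan-level relevance and comprehensiveness. The sketch below illustrates that general idea only; the 0–7 rating scale, the convention that a nonzero rating means "appropriate," and the equal weighting of the three components are assumptions for illustration, not the paper's actual formula.

```python
def composite_quality_score(ratings, gold_standard_count):
    """Illustrative composite plan-quality score (hypothetical weights).

    ratings: appropriateness rating (assumed 0-7 scale) assigned by an
             expert panel to each suggestion in the clinician's plan;
             a rating > 0 is assumed to mean "judged appropriate".
    gold_standard_count: number of diagnoses the panel considered
             appropriate for the case.
    """
    if not ratings:
        return 0.0
    # Weighted sum of individual suggestion ratings, normalized to 0-1.
    weighted = sum(ratings) / (7 * len(ratings))
    # Relevance: proportion of the plan's suggestions judged appropriate.
    relevance = sum(1 for r in ratings if r > 0) / len(ratings)
    # Comprehensiveness: proportion of panel-appropriate diagnoses that
    # appear in the plan (capped at 1.0).
    appropriate_in_plan = sum(1 for r in ratings if r > 0)
    comprehensiveness = min(appropriate_in_plan / gold_standard_count, 1.0)
    # Equal weighting of the three components (an assumption).
    return (weighted + relevance + comprehensiveness) / 3
```

A plan of four maximally rated suggestions covering all four panel diagnoses scores 1.0 under this sketch, while a plan containing an inappropriate (zero-rated) suggestion is penalized on all three components.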
This may have occurred for two reasons. First, the precise manner and clinical setting in which a DDSS might help a physician remain unclear—as an "oracle" in the uncommon clinical scenario of a diagnostic dilemma, or as a simple diagnostic reminder system in routine clinical practice. Second, as a consequence of this lack of clarity, a number of heterogeneous outcome measures have been used to quantify the clinical benefits of DDSS. Early studies expected the DDSS to be able to predict the "correct" diagnosis in a diagnostic dilemma. This was the "Greek oracle" model, in which the user remained a passive recipient of DDSS advice. These studies examined the "diagnostic accuracy" of the system functioning in isolation. A binary metric was commonly used: the system was accurate if it displayed the "correct" diagnosis and inaccurate if it did not. More sophisticated measures of system performance proposed by Berner et al. also studied the ranking of diagnostic hypotheses in a system's list and other discrete indicators of diagnostic quality, such as relevance and comprehensiveness, generated by comparing the DDSS diagnostic hypothesis set to a "gold standard" set generated by expert clinicians.

Subsequent evaluations of DDSS deemphasized the value of testing the system alone and focused on examining the impact of a DDSS on the user's diagnostic plans. This reflected the belief that, in real life, the clinician would serve as an active cognitive filter of DDSS advice rather than remain a passive user during system consultation. In this setting, it was not essential that the system possessed a high degree of diagnostic accuracy, so long as its suggestions positively influenced users' diagnostic reasoning; the clinical impact of a DDSS was assessed by measuring changes in the diagnostic quality of the clinician to whom decision support was provided. Friedman et al. described a composite score for this purpose. In general terms, however, most scoring schemes aimed to objectively measure the same concept—the quality of a diagnostic hypothesis plan—irrespective of whose efforts it represented (system or user). In Berner's study, numerous discrete indicators of quality were used; in Friedman's study, a single composite score was used.

Affiliations of the authors: Department of Paediatrics, St. Mary's Hospital, London, England (PR, MC, JFB); Department of Paediatrics, Princess Alexandra Hospital, Essex, England (RRK); Department of Paediatrics, Watford General Hospital, Watford, England (VN); ISABEL Medical Charity, London, England (ALT); Centre for Health Informatics and Multiprofessional Education, London, England (PMT); Klinische Informatiekunde (KIK), Academic Medical Centre, Amsterdam, The Netherlands (JCW). The authors thank Jason Maude of the ISABEL Medical Charity, Dr. C. Edwards, and Dr. T. Sajjanhar for their ideas and support in the development of the scoring system. This study was supported by an evaluation grant from the National Health Service Research and Development Department (NHS R&D), London. An abstract of the scoring procedure was presented at the Paediatric Education Section of the Royal College of Paediatrics and Child Health Annual Meeting, York, UK, April 2003. Dr. Joseph Britto is a Trustee and Medical Adviser of the ISABEL Medical Charity (nonremunerative post). Amanda Tomlinson works for the ISABEL Medical Charity full time as a research nurse. Correspondence and reprints: Joseph F. Britto, MD, Department of Paediatric Intensive Care, 7th Floor, St. Mary's Hospital, South Wharf Road, London W2 1NY, England; e-mail: . Received for publication: 01/29/03; accepted for publication: 05/15/03.
As part of the assessment of the impact of a free, Web-based differential diagnostic aid on clinical reasoning in an acute pediatric setting (ISABEL, ISABEL Medical Charity, UK), we sought an instrument to measure the quality of initial clinical assessment, consisting of diagnostic and management plans. The impact assessment was planned in two stages—a simulated study followed by a real-life clinical trial—so that methods validated during the simulation could then be used in the clinical trial. ISABEL utilizes unformatted electronic, natural-language text descriptions of diseases derived from standard textbooks as the underlying knowledge base; 3,500 disease descriptions are represented in its database. Commercial textual pattern-recognition software (Autonomy) searches the underlying knowledge base in response to clinical features input in free text and displays diseases with matching textual patterns, arranged by body system rather than in order of clinical probability. Thus, ISABEL functions primarily as a reminder tool that suggests 8 to 10 diagnostic hypotheses to clinicians, rather than acting as an "oracle." During initial system performance evaluation, the correct diagnosis formed part of the ISABEL reminder list on greater than 90% of occasions, and all diagnoses judged to be appropriate for each case by an expert panel were displayed in 73% of cases (data awaiting publication). Berner's comprehensiveness score (the proportion of appropriate diagnoses, as judged by the panel, included in the diagnostic plan under examination) applied to ISABEL in this study was 0.82. For the purposes of our simulated impact evaluation, many scoring systems were considered as candidate outcome measures. We needed a composite score that took into account all pertinent factors contributing to the quality of a diagnostic and management plan.
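Berner's comprehensiveness score, as defined above, is a simple proportion: how many of the panel's "appropriate" diagnoses appear in the plan being examined. A minimal sketch (the diagnosis names are hypothetical examples):

```python
def comprehensiveness(plan, panel_appropriate):
    """Berner-style comprehensiveness: the proportion of diagnoses the
    expert panel judged appropriate for the case that also appear in
    the plan under examination."""
    panel_appropriate = set(panel_appropriate)
    return len(panel_appropriate & set(plan)) / len(panel_appropriate)
```

For example, a plan covering three of four panel-appropriate diagnoses scores 0.75; the 0.82 reported for ISABEL above is the same ratio averaged over the study cases.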
Berner's comprehensiveness score, as well as his relevance score (the proportion of suggestions in the diagnostic plan that the panel found reasonable to consider, including retrospectively), were not considered suitable: suggestions were not weighted by how reasonable or appropriate they were (one highly appropriate suggestion and another less appropriate suggestion contributed the same value to the score). Friedman's composite score conceptualized diagnostic quality as having two primary components: a plausibility component derived from ratings of each individual diagnosis in a set (whether "correct" or "incorrect") and a location component derived from the position of the "correct" diagnosis if contained in the set. The composite score could not be computed without knowledge of a single "correct" diagnosis, which is usually not available in the acute medical setting for which ISABEL was designed. In this setting, an initial diagnostic plan is often generated with a dataset that includes only clinical history, an initial examination, and sometimes results from a set of "first-pass" investigations. It would be difficult to assign, or even expect, a single "correct" diagnosis at this stage, the emphasis being on considering the most appropriate set of diagnoses ("high frequency, highly plausible, high impact"). The location component of Friedman's score limited the maximum number of diagnostic suggestions to 6, making it difficult to assess comprehensiveness in this setting (>6 "appropriate" gold standard diagnoses may be appropriate for some cases, depending on the level of uncertainty at initial assessment). In addition, in the Friedman score, although each case might have numerous highly plausible suggestions, a list containing only the "correct" diagnosis was assigned the highest score—comprehensiveness was not rewarded.
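The contrast drawn above can be made concrete. Below is a sketch of a Berner-style relevance proportion and a two-component score in the spirit of Friedman's (plausibility plus location). The 0–7 scale, the equal component weights, and the normalization are assumptions for illustration, not Friedman's published formula; note how the location component is zero whenever no single "correct" diagnosis is known, which is exactly the limitation discussed above.

```python
def relevance(plan_ratings):
    """Berner-style relevance: proportion of the plan's suggestions the
    panel found reasonable to consider (a rating > 0 is assumed to
    mean "reasonable")."""
    return sum(1 for r in plan_ratings if r > 0) / len(plan_ratings)


def friedman_style_score(plausibility_ratings, correct_index, max_rank=6):
    """Illustrative two-component composite in the spirit of Friedman's
    score: a plausibility component (normalized mean panel rating of
    the set, 0-7 scale assumed) and a location component rewarding an
    early position of the single "correct" diagnosis.

    correct_index: 0-based position of the "correct" diagnosis in the
    set, or None if it is absent (or unknown, as in acute settings).
    """
    plausibility = sum(plausibility_ratings) / (7 * len(plausibility_ratings))
    if correct_index is None or correct_index >= max_rank:
        location = 0.0  # no credit without a ranked "correct" diagnosis
    else:
        location = (max_rank - correct_index) / max_rank
    return (plausibility + location) / 2
```

Under this sketch, a highly plausible set with no identifiable single "correct" diagnosis can never exceed 0.5, illustrating why the authors needed a score that rewards an appropriate, comprehensive set rather than one correct answer.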
The plausibility component, in which appropriate suggestions are weighted on a 0 to 7 scale by the expert panel, could not be used without modification: the appropriateness of a diagnosis may be based on its plausibility and its likelihood, as well as on its implications for further test ordering. Neither score had a mechanism to measure the quality of management plans; we felt that the full clinical impact of any DDSS would be manifested in changes engendered in physicians' diagnostic plans as well as in real changes made to the patient's treatment. The measurement of a DDSS's clinical impact therefore had to be undertaken from two separate but related views: (1) changes in the diagnostic plan and (2) changes in the management plan. Since these changes were generated purely as a result of the provision of diagnostic support, this concept is different from measuring the clinical impact of systems primarily intended to provide decision support for test ordering, antibiotic prescription, or critiquing patient management, or of integrated hospital information systems that offer advice on all of these functions. Such systems also offered advice on relatively narrow areas of decision making, such as antibiotic choice in infections or the management of hypertension, in which outcome selection was simpler. This study describes the development of a scoring metric for diagnostic and management plan quality and an examination of some of its measurement properties.
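Several of the measurement properties reported in the abstract (correlation with clinicians' subjective quality ratings, and the diagnostic-versus-management score correlation of r = 0.51) rest on Spearman rank correlation. A minimal pure-Python sketch, assuming no tied values (ties would require mid-rank averaging, which this version omits):

```python
def spearman_r(x, y):
    """Spearman rank correlation coefficient for two equal-length
    sequences without ties: 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    where d is the difference between paired ranks."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

In practice a statistics package (which also handles ties and p-values) would be used; the sketch simply shows the quantity being reported.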
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003